
    WhatsUp: An event resolution approach for co-occurring events in social media

    The rapid growth of social media networks has resulted in the generation of a vast amount of data, making manual analysis to extract newsworthy events impractical. Thus, automated event detection mechanisms are invaluable to the community. However, a clear majority of the available approaches rely only on data statistics without considering linguistics. A few approaches involve linguistics, but only to extract textual event details without the corresponding temporal details. Since linguistics defines the structure and meaning of words, ignoring it can cause severe information loss. Targeting this limitation, we propose a novel method named WhatsUp to detect temporal and fine-grained textual event details, using linguistics captured by self-learned word embeddings and their hierarchical relationships, and statistics captured by frequency-based measures. We evaluate our approach on recent social media data from two diverse domains and compare its performance with several state-of-the-art methods. Evaluations cover temporal and textual event aspects, and the results show that WhatsUp notably outperforms state-of-the-art methods. We also analyse its efficiency, revealing that WhatsUp is fast enough for (near) real-time detection. Further, the use of unsupervised learning techniques, including self-learned embeddings, makes our approach extensible to any language, platform and domain, and provides the capability to understand data-specific linguistics.
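
    The frequency-based side of such an approach can be illustrated with a minimal sketch: compare a word's smoothed relative frequency across two consecutive time windows and flag terms whose frequency grows sharply. The function name and smoothing scheme below are illustrative assumptions, not the measures used by WhatsUp.

```python
from collections import Counter

def burst_scores(prev_window, curr_window, smoothing=1.0):
    """Score each word by its frequency growth between two time windows.

    A simple frequency-based burst measure: the ratio of a word's
    smoothed relative frequency in the current window to its smoothed
    relative frequency in the previous one. Scores > 1 suggest a burst.
    """
    prev = Counter(prev_window)
    curr = Counter(curr_window)
    prev_total = sum(prev.values()) + smoothing * len(curr)
    curr_total = sum(curr.values()) + smoothing * len(curr)
    scores = {}
    for word, count in curr.items():
        p_curr = (count + smoothing) / curr_total
        p_prev = (prev[word] + smoothing) / prev_total
        scores[word] = p_curr / p_prev
    return scores

# Toy token streams from two consecutive windows of a sports stream.
prev = "match starts soon fans waiting".split()
curr = "goal goal amazing goal fans cheering".split()
scores = burst_scores(prev, curr)
top = max(scores, key=scores.get)  # 'goal' bursts in the current window
```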

    TTL: transformer-based two-phase transfer learning for cross-lingual news event detection

    Today, we have access to a vast amount of data, especially on the internet. Online news agencies play a vital role in this data generation, but most of their data is unstructured, requiring an enormous effort to extract important information. Thus, automated intelligent event detection mechanisms are invaluable to the community. In this research, we focus on identifying event details at the sentence and token levels from news articles, considering their fine granularity. Previous research has proposed various approaches, ranging from traditional machine learning to deep learning, targeting event detection at these levels. Among these, transformer-based approaches performed best, utilising transformers' transferability and context awareness to achieve state-of-the-art results. However, they treated sentence- and token-level tasks as separate, even though their interconnections could be exploited for mutual improvement. To fill this gap, we propose a novel learning strategy named Two-phase Transfer Learning (TTL) based on transformers, which allows the model to utilise the knowledge from a task at one data granularity for another task at a different granularity, and evaluate its performance in sentence- and token-level event detection. We also empirically evaluate how event detection performance can be improved for different languages (high- and low-resource), involving monolingual and multilingual pre-trained transformers and language-based learning strategies alongside the proposed strategy. Our findings mainly indicate the effectiveness of multilingual models in low-resource language event detection. Further, TTL can improve model performance, depending on the learning order of the involved tasks and their relatedness to the final predictions.
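
    The learning-order idea behind TTL can be sketched schematically: a shared representation trained for one granularity initialises the model for the other. The toy class below replaces the pre-trained transformer with a simple word set purely to illustrate the two-phase transfer; all names and the "encoder state" are hypothetical stand-ins, not the authors' implementation.

```python
class TwoPhaseLearner:
    """Schematic of two-phase transfer learning: knowledge learned for
    one granularity (sentence-level event detection) initialises the
    model for another (token-level detection). The real TTL fine-tunes
    pre-trained transformers; here the shared 'encoder state' is just a
    vocabulary of event-indicative words, for illustration only."""

    def __init__(self):
        self.encoder_state = set()  # stands in for shared encoder weights

    def train_sentence_level(self, labelled_sentences):
        # Phase 1: learn from coarse sentence-level event labels.
        for sentence, is_event in labelled_sentences:
            if is_event:
                self.encoder_state.update(sentence.lower().split())

    def predict_token_level(self, sentence):
        # Phase 2: the finer token-level task starts from phase-1 knowledge.
        return [(tok, tok.lower() in self.encoder_state)
                for tok in sentence.split()]

ttl = TwoPhaseLearner()
ttl.train_sentence_level([("Earthquake hits city", True),
                          ("Nice weather today", False)])
pred = ttl.predict_token_level("Earthquake reported")
```

    The point of the sketch is the ordering: phase 2 reuses, rather than re-initialises, what phase 1 learned, which is where the reported mutual task improvement comes from.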

    Embed2Detect: temporally clustered embedded words for event detection in social media

    Social media is becoming a primary medium to discuss what is happening around the world. Therefore, the data generated by social media platforms contain rich information describing ongoing events, and the timeliness of these data facilitates immediate insights. However, considering the dynamic nature and high volume of social media data streams, it is impractical to filter events manually, so automated event detection mechanisms are invaluable to the community. Apart from a few notable exceptions, most previous research on automated event detection has focused only on statistical and syntactical features, neglecting the underlying semantics that are important for effective information retrieval from text, since they represent the connections between words and their meanings. In this paper, we propose a novel method termed Embed2Detect for event detection in social media, combining the characteristics of word embeddings with hierarchical agglomerative clustering. The adoption of word embeddings gives Embed2Detect the capability to incorporate powerful semantic features into event detection, overcoming a major limitation of previous approaches. We evaluated our method on two recent real social media data sets representing the sports and political domains, and compared the results with several state-of-the-art methods. The results show that Embed2Detect is capable of effective and efficient event detection and outperforms recent event detection methods. For the sports data set, Embed2Detect achieved a 27% higher F-measure than the best-performing baseline, and for the political data set, the increase was 29%.
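
    The clustering side can be illustrated with a minimal single-linkage hierarchical agglomerative clustering over toy word vectors. The distance measure, merge threshold and embeddings below are illustrative assumptions, not Embed2Detect's actual configuration.

```python
import math

def agglomerative_clusters(vectors, threshold):
    """Minimal single-linkage hierarchical agglomerative clustering.

    vectors: dict mapping word -> embedding (list of floats).
    Repeatedly merges the two closest clusters until the smallest
    inter-cluster distance exceeds the threshold.
    """
    def dist(a, b):
        return math.dist(vectors[a], vectors[b])

    clusters = [{w} for w in vectors]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

# Toy embeddings: two groups of words that co-occur in an event window.
emb = {"goal": [1.0, 0.9], "score": [0.9, 1.0],
       "rain": [-1.0, -0.8], "storm": [-0.9, -1.0]}
clusters = agglomerative_clusters(emb, threshold=1.0)
```

    Words whose embeddings sit close together end up in the same cluster, which is the mechanism that lets semantically related event terms be grouped.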

    Leap2Trend: A Temporal Word Embedding Approach for Instant Detection of Emerging Scientific Trends

    Early detection of emerging research trends could potentially revolutionise the way research is done. For this reason, trend analysis has become an area of paramount importance in academia and industry, given its significant implications for research funding and public policy. The literature presents several emerging approaches to detecting new research trends, most of which rely mainly on citation counting. While citations have been widely used as indicators of emerging research topics, they suffer from some limitations: for instance, they can take months to years to accumulate enough to reveal trends, and they fail to dig into paper content. To overcome this problem, we introduce Leap2Trend, a novel approach to instant detection of research trends. Leap2Trend relies on temporal word embeddings (word2vec) to track the dynamics of similarities between pairs of keywords, their rankings and respective uprankings (ascents) over time. We applied Leap2Trend to two scientific corpora from different research areas, namely computer science and bioinformatics, and evaluated it against two gold standards: Google Trends hits and Google Scholar citations. The obtained results reveal the effectiveness of our approach, detecting trends with more than 80% accuracy and 90% precision in some cases. Such significant findings evidence the utility of Leap2Trend for tracking and detecting emerging research trends instantly.
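
    The ranking-and-upranking mechanism can be sketched as follows: rank keyword pairs by cosine similarity in each time epoch, then measure each pair's ascent between epochs. The toy embeddings and function names below are hypothetical, for illustration only.

```python
import math
from itertools import combinations

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pair_rankings(embeddings):
    """Rank keyword pairs by cosine similarity (rank 1 = most similar)."""
    pairs = sorted(combinations(sorted(embeddings), 2),
                   key=lambda p: -cosine(embeddings[p[0]], embeddings[p[1]]))
    return {pair: rank for rank, pair in enumerate(pairs, start=1)}

def uprankings(emb_t1, emb_t2):
    """Positive value = the pair moved up the similarity ranking (ascent)."""
    r1, r2 = pair_rankings(emb_t1), pair_rankings(emb_t2)
    return {pair: r1[pair] - r2[pair] for pair in r2}

# Toy temporal embeddings: 'deep' and 'learning' drift closer over time.
t1 = {"deep": [1.0, 0.0], "learning": [0.0, 1.0], "svm": [0.9, 0.1]}
t2 = {"deep": [1.0, 0.4], "learning": [0.8, 1.0], "svm": [0.9, 0.1]}
ascents = uprankings(t1, t2)
```

    A sustained positive ascent for a keyword pair is the signal that a trend is emerging around those keywords, long before citations accumulate.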

    Vec2Dynamics: A Temporal Word Embedding Approach to Exploring the Dynamics of Scientific Keywords—Machine Learning as a Case Study

    The study of the dynamics, or the progress, of science has been widely explored with descriptive and statistical analyses. It has also attracted several computational approaches, labelled together as the Computational History of Science, especially with the rise of data science and the development of increasingly powerful computers. Among these approaches, some works have studied dynamism in scientific literature by employing text analysis techniques that rely on topic models to study the dynamics of research topics. Unlike topic models, which do not delve deeper into the content of scientific publications, this paper uses, for the first time, temporal word embeddings to automatically track the dynamics of scientific keywords over time. To this end, we propose Vec2Dynamics, a neural-based computational history approach that reports the stability of the k-nearest neighbors of scientific keywords over time; the stability indicates whether a keyword is acquiring a new neighborhood due to the evolution of the scientific literature. To evaluate how Vec2Dynamics models such relationships in the domain of Machine Learning (ML), we constructed scientific corpora from the papers published in the Neural Information Processing Systems (NIPS, now abbreviated NeurIPS) conference between 1987 and 2016. The descriptive analysis performed in this paper verifies the efficacy of our proposed approach: we found a generally strong consistency between the obtained results and the Machine Learning timeline.
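
    The neighborhood-stability idea can be sketched as the Jaccard overlap between a keyword's k nearest neighbors in two time periods. The embeddings and distance metric below are illustrative assumptions, not Vec2Dynamics' actual configuration.

```python
import math

def knn(word, embeddings, k):
    """k nearest neighbours of a keyword by Euclidean distance."""
    others = [w for w in embeddings if w != word]
    others.sort(key=lambda w: math.dist(embeddings[word], embeddings[w]))
    return set(others[:k])

def neighborhood_stability(word, emb_t1, emb_t2, k):
    """Jaccard overlap of the word's k-NN sets in two periods:
    1.0 = unchanged neighbourhood, 0.0 = completely new neighbours."""
    n1, n2 = knn(word, emb_t1, k), knn(word, emb_t2, k)
    return len(n1 & n2) / len(n1 | n2)

# Toy embeddings: 'network' drifts towards 'kernel' between periods.
emb_t1 = {"network": [0.0, 0.0], "neuron": [0.1, 0.0], "kernel": [1.0, 1.0]}
emb_t2 = {"network": [1.0, 0.9], "neuron": [0.1, 0.0], "kernel": [1.0, 1.0]}
stability = neighborhood_stability("network", emb_t1, emb_t2, k=1)
```

    Low stability flags keywords whose semantic neighborhood is shifting, which is the signal used to read evolution off the literature.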

    DeepHist: Towards a Deep Learning-based Computational History of Trends in the NIPS

    Research in the analysis of big scholarly data has increased in the recent past, aiming to understand research dynamics and forecast research trends. The ultimate objective of this research is to design and implement novel, scalable methods for extracting knowledge and computational history. While citations are widely used to identify emerging research topics, they can take months or even years to stabilise enough to reveal research trends. Consequently, faster yet accurate methods for trend analysis and computational history are needed that dig into the content and semantics of an article. Therefore, this paper conducts a fine-grained content analysis of scientific corpora from the domain of Machine Learning. This analysis uses DeepHist, a deep learning-based computational history approach that relies on dynamic word embeddings, which represent words with low-dimensional vectors computed by deep neural networks. The scientific corpora come from 5991 publications of the Neural Information Processing Systems (NIPS) conference between 1987 and 2015, divided into six 5-year timespans. The analysis of these corpora generates visualisations produced by applying t-distributed stochastic neighbor embedding (t-SNE) for dimensionality reduction. The qualitative and quantitative study reported here reveals the evolution of the prominent Machine Learning keywords; this evolution supports the popularity of current research topics in the field. This support is evident in how well the popularity of the detected keywords correlates with the citation counts received by their corresponding papers: Spearman's positive correlation is 100%. With such a strong result, this work evidences the utility of deep learning techniques for determining the computational history of science.
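
    The reported correlation figure corresponds to Spearman's rank correlation, which is the Pearson correlation of the rank vectors of the two series. A minimal stdlib sketch with hypothetical popularity and citation figures:

```python
def ranks(values):
    """Rank values from 1 (smallest); ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: keyword popularity vs. citation counts per paper.
popularity = [3, 10, 7, 1]
citations = [30, 120, 80, 5]
rho = spearman(popularity, citations)  # identical orderings give rho = 1.0
```

    A rho of 1.0 (reported in the paper as "100%") means the two quantities induce exactly the same ordering, even when their magnitudes differ.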

    Diabetes Disease Prediction System using HNB classifier based on Discretization Method

    Diagnosing diabetes early is critical, as it helps patients live with the disease in a healthy way - through healthy eating, taking appropriate medical doses, and being more vigilant in their movements and activities to avoid wounds, which are difficult to heal for diabetic patients. Data mining techniques are typically used to detect diabetes with high confidence, to avoid misdiagnoses with other chronic diseases whose symptoms are similar. Hidden Naïve Bayes (HNB) is a data-mining classification algorithm that builds on the conditional independence assumption of traditional Naïve Bayes. The results of this research study, conducted on the Pima Indian Diabetes (PID) dataset, show that the HNB classifier achieved a prediction accuracy of 82%, and that the discretization method increases the performance and accuracy of the HNB classifier.
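
    Discretization here means mapping continuous attributes (such as glucose readings) into a small number of intervals before training the classifier. A minimal equal-width binning sketch follows; the study's exact discretization method may differ, and the readings are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Discretise continuous values into n_bins equal-width intervals,
    returning a bin index (0 .. n_bins-1) for each value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal values
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical plasma glucose readings discretised into 3 bins
# (roughly: low, medium, high), as a preprocessing step for HNB.
glucose = [85, 90, 140, 155, 199]
bins = equal_width_bins(glucose, 3)
```

    After this step, the classifier sees categorical attribute values instead of raw numbers, which is what allows a Bayesian classifier like HNB to estimate its conditional probability tables.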

    TED-S: Twitter Event Data in Sports and Politics with Aggregated Sentiments

    Even though social media contain rich information on events and public opinions, it is impractical to filter this information manually due to the volume and dynamic nature of the data. Thus, automated extraction mechanisms are invaluable to the community. Real data with ground-truth labels are needed to build and evaluate such systems. Yet, to the best of our knowledge, no available social media dataset covers continuous time periods with both event and sentiment labels; existing datasets provide one or the other. Datasets without time gaps are huge due to the high rate of data generation, and require extensive effort to label manually. Previous research has proposed various labelling approaches for such datasets, ranging from unsupervised to supervised. However, their generic nature means they mainly fail to capture event-specific sentiment expressions, making them inappropriate for labelling event sentiments. Filling this gap, we propose in this paper a novel data annotation approach involving several neural networks. Our approach outperforms commonly used sentiment annotation models such as VADER and TextBlob. It also generates probability values for all sentiment categories, besides providing a single category per tweet, supporting aggregated sentiment analyses. Using this approach, we annotate and release a dataset named TED-S, covering two diverse domains: sports and politics. TED-S contains complete subsets of Twitter data streams with both sub-event and sentiment labels, providing the ability to support research on event sentiment.
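
    The per-tweet probability values support aggregation directly: averaging the distributions over a time window yields a window-level sentiment profile rather than a single majority label. A minimal sketch with hypothetical model outputs:

```python
def aggregate_sentiment(tweet_probs):
    """Aggregate per-tweet probability distributions over sentiment
    categories into one window-level distribution by averaging."""
    n = len(tweet_probs)
    cats = tweet_probs[0].keys()
    return {c: sum(p[c] for p in tweet_probs) / n for c in cats}

# Hypothetical per-tweet probabilities from the annotation model
# for one time window of the stream.
window = [
    {"positive": 0.7, "neutral": 0.2, "negative": 0.1},
    {"positive": 0.5, "neutral": 0.3, "negative": 0.2},
    {"positive": 0.1, "neutral": 0.3, "negative": 0.6},
]
agg = aggregate_sentiment(window)
dominant = max(agg, key=agg.get)
```

    Keeping full distributions rather than single labels is what lets downstream analyses track how mixed the sentiment around a sub-event is, not just which category won.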